This paper describes a new method for deriving incidence rates of a chronic
disease from prevalence data. It is based on a new ordinary differential
equation, which relates the change in the age-specific prevalence to the
age-specific incidence and mortality rates. The method allows the extraction
of longitudinal information from cross-sectional studies. The applicability of the
method is tested on the prevalence of dementia in Germany. The derived
age-specific incidence is in good agreement with published values.

1 Introduction

Basic epidemiological characteristics of a disease are the prevalence, the proportion of
diseased persons in the population, and the incidence, which focuses on the number
of new cases. The two characteristics are fundamentally different: the first measures the
actual presence of the disease, the second refers to the new cases. Typically, prevalence
and incidence of a disease are surveyed in observational studies. The prevalence can
easily be assessed in cross-sectional studies: the study population is interviewed or
examined with respect to the disease. The classical approach to measuring incidence is
the cohort study, which is somewhat more complex. A certain group of individuals is
examined at the start of the study to determine whether the disease is present. The healthy
individuals of the group are examined at least once more at a later point in time to find out
whether the disease has occurred in the meantime. Since at least one follow-up investigation
must take place, a cohort study is usually much more complex and expensive than a
cross-sectional study. Particular difficulties arise from the fact that participants get lost
after the baseline examination (loss to follow-up).

For some questions, the incidence of a disease is more important than knowing the
number of those who are already ill. Many questions in health services research,
such as the allocation of resources, need information about the number of expected
patients. Within epidemiology, there are several attempts to derive incidences from
prevalence data. A simple, popular example may illustrate this. Consider a closed population
of size $N$; this means there is no migration and the numbers of births and deaths are
exactly the same for the considered period of length $\Delta t > 0$. Let $C$ denote the number
of persons in the population who suffer from a chronic disease ($C$ stands for cases).
Assuming that the number of diseased persons is constant over the time period, it follows
that the number of new cases is just equal to the number of patients who die. Hence, it
holds

$$(N - C) \cdot i \cdot \Delta t = C \cdot m_1 \cdot \Delta t,$$

where $i$ is the incidence rate and $m_1$ is the mortality of the diseased¹. By defining the
(overall) prevalence $p := \frac{C}{N}$, the term $\frac{C}{N - C}$ can be expressed as
$\frac{C}{N - C} = \frac{p}{1 - p}$, which is called prevalence odds. Since the inverse of the
mortality $m_1$ is the mean duration $d$ of the disease, it follows:

$$\frac{p}{1 - p} = i \cdot d.$$

This corresponds to the often found statement that the prevalence odds equals the
product of incidence and disease duration (see for example (Szklo and Nieto, 2007)). For
rare diseases ($1 - p \approx 1$) this reads as: prevalence equals the product of incidence and
duration.
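
As a quick numerical illustration of the relation above, the following Python sketch recovers the prevalence from the prevalence odds. The incidence rate and the mortality of the diseased are made-up values chosen for illustration only; they are not data from this paper, and the function name is hypothetical.

```python
def prevalence_from_incidence(i, m1):
    """Prevalence in a closed population with a constant number of cases.

    i  : incidence rate per person-year (hypothetical value)
    m1 : mortality rate of the diseased per person-year (hypothetical value)
    """
    d = 1.0 / m1                 # mean disease duration is the inverse of m1
    odds = i * d                 # prevalence odds: p / (1 - p) = i * d
    return odds / (1.0 + odds)   # invert the odds to obtain p


# Hypothetical example: 1 new case per 100 person-years, mean duration 5 years.
p = prevalence_from_incidence(i=0.01, m1=0.2)
print(f"prevalence = {p:.4f}, rare-disease approximation i * d = {0.01 / 0.2:.4f}")
```

For small odds the exact prevalence and the product of incidence and duration are close, which is the rare-disease approximation mentioned in the text.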

Besides this simple example, a number of more complex approaches exist to estimate
the incidence from prevalence data. The article by Langohr (1999) gives an overview.
This work reports on a new method, which is based on a simple compartment model
and uses an ordinary differential equation (ODE) to express transitions between the
compartments. Compartment models in epidemiology go back at least to the early 1990s
(see for example (Keiding et al., 1990)). Murray and Lopez from the Harvard Center for
Population and Development Studies describe that they express the transitions between
the compartments in terms of ODEs (1994). Without quoting Keiding, they call their
model the Harvard Incidence-Prevalence Model. Unfortunately, they do not describe their
equations. Later they published another (slightly more complicated) compartment model
and presented the associated ODEs (Murray and Lopez, 1996). Our approach builds on
the original model of Keiding and a two-dimensional system of ODEs. By analytical
transformations this two-dimensional system can be reduced to a one-dimensional
ODE. The reduced equation to our knowledge has not yet been published by other
groups. We take this equation further to derive the incidence rate from prevalence data.
In contrast to the multi-dimensional system of Murray and Lopez, the one-dimensional
equation can be solved easily for the incidence.

¹ For later use we denote the mortality of the healthy and the diseased with $m_0$ and $m_1$, respectively.
The subindex dichotomizes the presence of the disease.

This paper is organized as follows: Section 2 describes the newly discovered link
between the age distributions of the prevalence, incidence and mortality rates. In Section
3 the new method is applied to data of the statutory health insurance (SHI) in
Germany. The age distribution of persons with a diagnosis of dementia is used to derive the
incidence rates in the associated age groups. Finally, the results are discussed in Section 4.



2 The new relation between incidence, prevalence and mortality

Compartment models are widespread in medicine and other sciences (Godfrey, 1983).
In the epidemiology of infectious diseases they play a prominent role with a variety of
applications (Keeling and Rohani, 2007). A simple model for the study of non-infectious
diseases is shown in Figure 1. Three states Normal, Disease and Death are considered,
plus the transitions between the states (Keiding, 1991; Murray and Lopez, 1994). In
general, the transition rates depend on the calendar time $t$ (sometimes called the period)
and the age $a$. Henceforth, only irreversible diseases are considered. The transition from
the state Disease to the state Death often depends on the duration $d$ of the disease.
The influence of calendar time reflects, for example, the change in mortality or medical
progress over decades.

As shown in Figure 1, people in the population get the disease with incidence rate $i$.
The mortality rate depends on the state: Non-diseased and diseased persons die with
rates $m_0$ and $m_1$, respectively. Mostly, the rate $m_1$ will be higher than the rate $m_0$.
For historical reasons, the numbers of individuals in the states Normal and Disease are
denoted $S$ (susceptibles) and $C$ (cases), respectively.

Figure 1: Simple model of a chronic disease with three states. Persons in the state
Normal are healthy with respect to the considered disease. In the state Disease
they suffer from the disease. The transition rates depend on the calendar time
$t$, on the age $a$, and in case of the disease-specific mortality $m_1$ also on the
disease's duration $d$.

Henceforth, we need the following assumptions:

1. The rates $i$, $m_0$, and $m_1$ do not depend on calendar time $t$,

2. the mortality rate $m_1$ of the diseased does not depend on the duration $d$,

3. the population is closed (i.e. there is no migration),

4. the birth rates of newborns with and without the disease are constant over time.

Furthermore, let us assume that the changes in S and C are proportional to the 
differences of the in- and outflows to and from the compartments: 



$$\begin{aligned}
\frac{dS}{da} &= -\bigl(i(a) + m_0(a)\bigr) \cdot S, \\
\frac{dC}{da} &= i(a) \cdot S - m_1(a) \cdot C.
\end{aligned} \qquad (1)$$
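
To make the compartment system concrete, the following Python sketch integrates the two equations for $S$ and $C$ over age with a classical fourth-order Runge-Kutta scheme. The solver choice, the function name and the constant rates in the usage lines are assumptions made for illustration; the paper does not prescribe a numerical scheme, and the rates are not estimates for any real disease.

```python
import math


def simulate_compartments(i, m0, m1, s0=1.0, c0=0.0, a_max=100.0, h=0.01):
    """Integrate dS/da = -(i + m0) * S and dC/da = i * S - m1 * C with RK4.

    i, m0, m1 are callables of age a; s0 and c0 are the compartment sizes
    at age 0. Returns lists of ages, S values and C values.
    """
    def rhs(a, s, c):
        return -(i(a) + m0(a)) * s, i(a) * s - m1(a) * c

    n = int(round(a_max / h))
    a, s, c = 0.0, s0, c0
    ages, S, C = [a], [s], [c]
    for _ in range(n):
        k1s, k1c = rhs(a, s, c)
        k2s, k2c = rhs(a + h / 2, s + h / 2 * k1s, c + h / 2 * k1c)
        k3s, k3c = rhs(a + h / 2, s + h / 2 * k2s, c + h / 2 * k2c)
        k4s, k4c = rhs(a + h, s + h * k3s, c + h * k3c)
        s += h / 6 * (k1s + 2 * k2s + 2 * k3s + k4s)
        c += h / 6 * (k1c + 2 * k2c + 2 * k3c + k4c)
        a += h
        ages.append(a)
        S.append(s)
        C.append(c)
    return ages, S, C


# Hypothetical constant rates (per person-year), for illustration only.
ages, S, C = simulate_compartments(
    lambda a: 0.01, lambda a: 0.02, lambda a: 0.05, a_max=80.0, h=0.1
)
prevalence = [c / (s + c) for s, c in zip(S, C)]
print(f"prevalence at age {ages[-1]:.0f}: {prevalence[-1]:.4f}")

# With constant rates the system has a closed-form solution to compare against.
exact_S = math.exp(-0.03 * ages[-1])
print(f"S numerical = {S[-1]:.6f}, S exact = {exact_S:.6f}")
```

With age-dependent rates the same function can be called with arbitrary callables in place of the constant lambdas.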




With these assumptions the central result of this work can be formulated:

Theorem 2.1. Let mortalities $m_1, m_0 \in C^0([0, \infty))$ and $S, C \in C^1([0, \infty))$ with
$S(a) + C(a) > 0$ for all $a \in [0, \infty)$, then the age-specific prevalence

$$p = \frac{C}{S + C} \qquad (2)$$

is differentiable in $[0, \infty)$ and it holds

$$\frac{dp}{da} = (1 - p) \cdot \bigl(i - p \cdot (m_1 - m_0)\bigr). \qquad (3)$$

Proof. This is an easy application of the quotient rule to Eq. (2) using Eq. (1). □
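
For readers who want the intermediate steps of the proof, the computation can be written out as follows. The shorthand $N := S + C$ is introduced here for brevity; the last two lines use $S/N = 1 - p$ and $C/N = p$.

```latex
\begin{align*}
\frac{dp}{da}
  &= \frac{\frac{dC}{da}\,(S + C) - C\,\bigl(\frac{dS}{da} + \frac{dC}{da}\bigr)}{(S + C)^2} \\
  &= \frac{(i\,S - m_1\,C)\,N + C\,(m_0\,S + m_1\,C)}{N^2} \\
  &= \frac{i\,S\,N - (m_1 - m_0)\,S\,C}{N^2} \\
  &= (1 - p)\cdot i - (m_1 - m_0)\cdot(1 - p)\cdot p \\
  &= (1 - p)\cdot\bigl(i - p\cdot(m_1 - m_0)\bigr).
\end{align*}
```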

Depending on what information is given about the mortalities, the ODE (3) changes
its type (see Table 1). Note that the overall mortality $m$ in the population can be
expressed as

$$m(a) = (1 - p(a)) \cdot m_0(a) + p(a) \cdot m_1(a). \qquad (4)$$

Furthermore, in some cases the relative mortality risk $R(a) = \frac{m_1(a)}{m_0(a)}$ is known.



Table 1: Type of the ODE (3).

Known mortality   Right-hand side                                                  Type of the ODE
$m, m_0$          $(1 - p) \cdot (i - (m - m_0))$                                  Linear
$m_0, m_1$        $(1 - p) \cdot (i - p \cdot (m_1 - m_0))$                        Riccati
$m_0, R$          $(1 - p) \cdot (i - p \cdot m_0 \cdot (R - 1))$                  Riccati
$m_1, R$          $(1 - p) \cdot (i - p \cdot m_1 \cdot (1 - 1/R))$                Riccati
$m, m_1$          $(1 - p) \cdot (\ldots)$                                         Abel
$m, R$            $(1 - p) \cdot (i - m \cdot (1 - (p \cdot (R - 1) + 1)^{-1}))$   Abel



If the ODE is linear, it can be solved analytically. If it is of Riccati or Abel type,
a general solution is not known (Kamke, 1983). Since the overall mortality $m$ in many
populations can be obtained from official life tables, and relative mortality risks $R$ for
several diseases are often reported in epidemiological studies, the most important case
is when $m$ and $R$ are given. The following section will show an application of this.
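
The right-hand side for the case where $m$ and $R$ are known (last row of Table 1) can be checked by eliminating $m_0$ with Eq. (4) and $m_1 = R \cdot m_0$. The following lines are a worked verification added for convenience, not a new result.

```latex
\begin{align*}
m &= (1 - p)\, m_0 + p\, R\, m_0 = m_0 \cdot \bigl(p\,(R - 1) + 1\bigr), \\
p \cdot (m_1 - m_0) &= p\,(R - 1)\, m_0
   = m \cdot \frac{p\,(R - 1)}{p\,(R - 1) + 1}
   = m \cdot \Bigl(1 - \bigl(p\,(R - 1) + 1\bigr)^{-1}\Bigr), \\
\frac{dp}{da} &= (1 - p) \cdot \Bigl(i - m \cdot \bigl(1 - (p\,(R - 1) + 1)^{-1}\bigr)\Bigr).
\end{align*}
```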



Note that independence from calendar time $t$, zero migration and constant birth
rates are crucial for Eq. (1). This can be seen by realizing that the population size
$N(a) = S(a) + C(a)$ at age $a$ fulfills the following equation:

$$\begin{aligned}
\frac{dN}{da} &= \frac{dS}{da} + \frac{dC}{da} \\
&= -m_0 \cdot S - m_1 \cdot C \\
&= -N \cdot \bigl[(1 - p) \cdot m_0 + p \cdot m_1\bigr].
\end{aligned}$$

Thus, by using Eq. (4) it holds $\frac{dN}{da} = -m \cdot N$, which is the defining equation of a
stationary population (Preston and Coale, 1982). These assumptions ensure that the population
is stationary (Preston et al., 2001, pp. 53ff).



3 Application to dementia 



As already described in the introduction, the most important application is the derivation
of the age-specific incidence rate $i(a)$ from the age distribution $p(a)$ of the prevalence of
a chronic disease. The usefulness of the method is examined in an example of dementia.
The prevalence of dementia in Germany is reported in the work of Ziegler and Doblhammer
(2009). The basis for the values published there was claims data from the German statutory
health insurance (SHI) in the year 2002. About 90 percent of the whole population in
Germany are members of the SHI. A three percent random sample of all of these is
used for the analysis. Hence, information on more than 2.3 million people is taken into
account.

For each of the persons in the three-percent sample, demographic data (age, gender),
the number of doctor visits and hospital stays, both with diagnoses in ICD
coding, are included. Ziegler and Doblhammer associate the following ICD-10-GM
diagnoses with dementia: F00, F01, F02, F03, G30. The resulting prevalences in Germany
are reported in Table 2.



Table 2: Prevalence of dementia in members of the German SHI in 2002 (Ziegler and Doblhammer, 2009).

Age group (in years)   Females (%)   Males (%)
60-64                  0.6           0.8
65-69                  1.3           1.5
70-74                  3.1           3.2
75-79                  6.8           5.6
80-84                  12.8          10.3
85-89                  23.1          17.9
90-94                  31.3          24.2



The prevalence data show that dementia is more frequent in women aged 75 years and older
than in men in the same age group. Unfortunately, Ziegler and Doblhammer have not
reported confidence intervals or p-values to decide whether the differences in the age groups
are significant. Due to the large sample size this is likely.

For the application of the one-dimensional ODE (3) we need statements about the
mortality. Here, we use the general mortality $m$ as surveyed by the Federal Statistical
Office of Germany. The relative mortality $R$ of persons with dementia can be found in
(Rait et al., 2010): In the first year after the diagnosis of the disease, it is about 3.7 and
in subsequent years about 2.4. In this work, the relative mortality is set to be constant
at $R(a) = 2.4$.

In order to derive the incidence rate $i(a)$ from Eq. (3), the following steps are
performed:

1. Derive a spline function $s(a)$ that interpolates the prevalence data.

2. Calculate the derivative $\frac{ds}{da}$ and define the function
   $$c(a) = \frac{ds/da}{1 - s(a)}.$$

3. The incidence rate $i(a)$ can be expressed by
   $$i(a) = c(a) + m(a) \cdot \Bigl(1 - \bigl(s(a) \cdot (R(a) - 1) + 1\bigr)^{-1}\Bigr).$$

The spline is used to transform the discrete values of the prevalence data into a
differentiable function. Here, the uniquely defined cubic spline with natural boundary conditions
that interpolates the prevalence data is chosen. It is twice continuously differentiable.
Calculations are performed with the statistical software R (The R Foundation for Statistical
Computing), version 2.12.0.
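
The three steps can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the R code used for the paper: the natural cubic spline is built by hand, the prevalence values of Table 2 are placed at the midpoints of the age groups (an assumption), and `hypothetical_mortality` is a made-up Gompertz-type placeholder for the official life table, so the printed rates are not expected to reproduce the published results.

```python
import bisect
import math


def natural_spline_second_derivs(x, y):
    """Second derivatives M_j of the natural cubic spline (needs >= 3 points)."""
    n = len(x) - 1
    h = [x[j + 1] - x[j] for j in range(n)]
    # Tridiagonal system for the interior M_1..M_{n-1}; M_0 = M_n = 0 (natural).
    sub = [h[j - 1] for j in range(1, n)]
    diag = [2.0 * (h[j - 1] + h[j]) for j in range(1, n)]
    sup = [h[j] for j in range(1, n)]
    rhs = [6.0 * ((y[j + 1] - y[j]) / h[j] - (y[j] - y[j - 1]) / h[j - 1])
           for j in range(1, n)]
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n - 1):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    inner = [0.0] * (n - 1)
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        inner[k] = (rhs[k] - sup[k] * inner[k + 1]) / diag[k]
    return [0.0] + inner + [0.0]


def spline_value_and_slope(x, y, M, a):
    """Return s(a) and ds/da of the natural cubic spline at age a."""
    j = max(0, min(len(x) - 2, bisect.bisect_right(x, a) - 1))
    h = x[j + 1] - x[j]
    left, right = x[j + 1] - a, a - x[j]
    cl = y[j] / h - M[j] * h / 6.0
    cr = y[j + 1] / h - M[j + 1] * h / 6.0
    s = (M[j] * left**3 + M[j + 1] * right**3) / (6.0 * h) + cl * left + cr * right
    ds = (-M[j] * left**2 + M[j + 1] * right**2) / (2.0 * h) - cl + cr
    return s, ds


def hypothetical_mortality(a):
    """Placeholder Gompertz-type overall mortality; NOT the official life table."""
    return 5e-5 * math.exp(0.09 * a)


def incidence(x, y, M, a, m, R=2.4):
    """Steps 2 and 3: i(a) = c(a) + m(a) * (1 - 1 / (s(a) * (R - 1) + 1))."""
    s, ds = spline_value_and_slope(x, y, M, a)
    c = ds / (1.0 - s)
    return c + m(a) * (1.0 - 1.0 / (s * (R - 1.0) + 1.0))


# Female prevalence from Table 2 as proportions, at assumed age-group midpoints.
ages = [62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 92.5]
prev = [0.006, 0.013, 0.031, 0.068, 0.128, 0.231, 0.313]
M = natural_spline_second_derivs(ages, prev)            # step 1
rates = [incidence(ages, prev, M, a, hypothetical_mortality) for a in ages]
for a, i in zip(ages, rates):
    print(f"age {a:4.1f}: {100 * i:.2f} per 100 person-years")
```

Replacing `hypothetical_mortality` with an interpolation of the actual life table values would be the natural next step for anyone who wants to compare the output with the results reported below.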

Using the prevalence data shown in Table 2 as input values, the following results are
obtained with the algorithm described above (Table 3).



Table 3: Age-specific incidence rates for dementia as calculated with the new method.

Age group    Females                  Males
(in years)   (per 100 person-years)   (per 100 person-years)
60-64        0.1                      0.1
65-69        0.2                      0.3
70-74        0.6                      0.5
75-79        1.2                      1.1
80-84        2.9                      2.6
85-89        5.4                      4.8
90-94        9.7                      8.4


4 Discussion



This paper describes a new method for deriving incidence rates of a chronic disease
from prevalence data. It relies on a simple compartment model with three states and
transitions between these. With the assumptions that the transition rates just depend
on age $a$ and the population is stationary, a one-dimensional ODE relates the change in
the age-specific prevalence to the incidence and mortality rates. After the age-stratified
prevalence data are transformed into a differentiable form, the ODE can be solved for the
age-specific incidence rate. For this purpose, the natural cubic interpolation spline is
used. So far, this choice is arbitrary; there might be better ones.

While the incidence in the system (1) can only be extracted with sophisticated methods
(for example with a restricted optimization), the approach based on the ODE (3)
and the spline is considerably less computationally intensive. Computation time can be
a problem, because the prevalence data are typically fraught with errors and a sensitivity
analysis should be performed. In such sensitivity analyses, many (thousands of)
constellations of the input data (i.e. the prevalence in the age groups) are generated and
the changes in the result (age-specific incidence rates) are monitored. In a validation
study of the method treating data from dialysis patients, 1500 optimizations took more
than six hours on an AMD Quad-Core PC with 2.6 GHz.



Because the data of the SHI used for (Ziegler and Doblhammer, 2009) cover a period
of one year, four reporting periods (quarters) are spanned. Based on this, the authors
try to estimate the incidence rate, too. When a member of the SHI gets a diagnosis of
dementia in the second or third quarter but not in the first quarter, Ziegler and Doblhammer
consider this as a potentially new case. To avoid false positives (dementia in the early
stage is difficult to detect), only those of the potentially new cases are finally
taken into account in which the fourth quarter also contained a diagnosis of dementia.
Using this method, the authors report the incidence rates shown in Table 4. For
comparison, the values of the new ODE method are shown in brackets.

Table 4: Age-specific incidence rates (Ziegler and Doblhammer, 2009, Tab. 3). Values
in brackets are the results of our method (cf. Tab. 3).

Age group    Females                  Males
(in years)   (per 100 person-years)   (per 100 person-years)
65-69        0.3 (0.2)                0.3 (0.3)
70-74        0.8 (0.6)                0.7 (0.5)
75-79        1.8 (1.2)                1.7 (1.1)
80-84        3.5 (2.9)                3.0 (2.6)
85-89        6.9 (5.4)                5.2 (4.8)
90-94        9.7 (9.7)                7.6 (8.4)

Compared with the values of our method, the results in Table 4 are, with few exceptions,
higher. There are two possible reasons:

1. Ziegler and Doblhammer overestimate the incidence by their method, because prevalent
   cases with no doctor visit in the first quarter count as newly incident cases.
   This is very likely, since incidence estimates relying on one disease-free quarter
   only are very prone to overestimation. For a recent work reflecting on this, see
   (Abbas et al., 2011).

2. On the other hand, it may be that our values are systematically too small. One
   reason might be the increased relative mortality in the first year after diagnosis of
   dementia. Instead of the measured value $R = 3.7$, here $R = 2.4$ is used. Hence,
   our relative risk of death is too low, which manifests in an underestimation of
   the incidence. However, in comparison with the age-specific incidences in other
   studies (Ziegler and Doblhammer, 2009, Fig. 3), our values are in very good
   agreement.
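
The comparison in Table 4 can be summarized with a few lines of Python. The sketch only re-tabulates the claims-based rates and the ODE-based rates from Table 4; the dictionary layout and the variable names are illustrative choices, not part of the original analysis.

```python
# Incidence rates per 100 person-years transcribed from Table 4:
# (claims-based estimate, ODE-based estimate of this paper)
table4 = {
    "65-69": {"females": (0.3, 0.2), "males": (0.3, 0.3)},
    "70-74": {"females": (0.8, 0.6), "males": (0.7, 0.5)},
    "75-79": {"females": (1.8, 1.2), "males": (1.7, 1.1)},
    "80-84": {"females": (3.5, 2.9), "males": (3.0, 2.6)},
    "85-89": {"females": (6.9, 5.4), "males": (5.2, 4.8)},
    "90-94": {"females": (9.7, 9.7), "males": (7.6, 8.4)},
}

higher = equal = lower = 0
for group, row in table4.items():
    for sex, (claims, ode) in row.items():
        # Ratio below 1 means the ODE-based rate is smaller than the claims-based one.
        print(f"{group} {sex:8s} claims={claims:4.1f} ode={ode:4.1f} ratio={ode / claims:.2f}")
        if claims > ode:
            higher += 1
        elif claims == ode:
            equal += 1
        else:
            lower += 1

print(f"claims-based rate higher in {higher}, equal in {equal}, lower in {lower} cells")
```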